Skip to content

fix: Keep track of synthetic apify-default-dataset-item events#814

Open
Mantisus wants to merge 10 commits intoapify:masterfrom
Mantisus:apify-default-dataset-item-event
Open

fix: Keep track of synthetic apify-default-dataset-item events#814
Mantisus wants to merge 10 commits intoapify:masterfrom
Mantisus:apify-default-dataset-item-event

Conversation

@Mantisus
Copy link
Collaborator

@Mantisus Mantisus commented Feb 26, 2026

Description

  • Add _DatasetClientPPEMixin to dataset storage clients to automatically charge for the synthetic apify-default-dataset-item event on every push to the default dataset.
  • Update Actor.push_data() to account for the combined explicit + synthetic event price when enforcing budget limits.

Issues

Testing

  • Add new unit and E2E test.

@Mantisus Mantisus requested a review from janbuchar February 26, 2026 00:46
@Mantisus Mantisus self-assigned this Feb 26, 2026
Copy link
Contributor

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments 🙂

@janbuchar
Copy link
Contributor

FYI - apify/apify-sdk-js#570 - this should not be a problem in the Python version, but please make sure.

"""Context manager to acquire the charge lock if PPE charging manager is active."""
charging_manager = charging_manager_ctx.get()
if charging_manager:
if charging_manager.charge_lock.locked():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, asyncio.Lock.locked() tells whether any coroutine currently holds the lock. Doesn't that mean we would skip acquiring the lock and run unsynchronized? That would defeat the purpose of the lock in the first place. Am I missing something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When Actor.push_data is called, a lock is set, and Dataset.push_data is called inside it, which leads to this internal lock. Without this check, it will cause a deadlock; the flow will wait for the lock to be unlocked, which itself is blocked.
When calling Dataset.push_data directly, the lock will be set as usual.

Copy link
Contributor

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My last comments were sent. Could you @janbuchar now please check it as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants